- Title
- Source code plagiarism detection in the presence of pervasive plagiarism-hiding source code modifications
- Creator
- Cheers, Hayden John
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2021
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- Source code similarity is a well-studied area of software engineering. One notable application in education is the detection of source code plagiarism among undergraduate computing students. Many prior works have proposed automated source code plagiarism detection tools to identify indications of plagiarism in undergraduate programming assignments. However, there are three important problems with existing work on the development and evaluation of these tools. Firstly, evaluations of source code plagiarism detection tools are commonly not reproducible. Secondly, the tools do not indicate which assignment submissions are suspicious of plagiarism. Thirdly, there are no comprehensive studies evaluating how source code modifications used to hide plagiarism affect these tools. The work in this thesis first addresses these three problems, and then proposes a novel source code plagiarism detection tool that is more robust and accurate than existing tools. Firstly, evaluations of source code plagiarism detection tools are not reproducible because evaluation data sets are not released, and proposed tools are not made available for reuse. Neither of these factors can be directly addressed. Instead, this work presents tools for the automatic generation of evaluation data sets, and a pipeline that facilitates the automated evaluation of source code plagiarism detection tools. Together these afford a semi-automatic and reproducible method of evaluating source code plagiarism detection tools. Secondly, an approach for identifying assignment submissions suspicious of plagiarism is presented. The approach applies clustering to identify groups of assignment submissions with similar source code similarity scores. The relations between clustered scores are analysed and used to identify groups of assignment submissions that are suspicious of plagiarism. This affords a semi-automatic method of suggesting groups of students suspected of plagiarising in their assignment submissions. Thirdly, an empirical evaluation of source code plagiarism detection tools against pervasive source code modifications representative of undergraduate plagiarisers is presented. This evaluation measures the performance of available source code plagiarism detection tools against a selection of 14 source code transformations, and the injection of 4 different types of source code fragments. The results indicate that existing source code plagiarism detection tools are not robust against pervasive plagiarism-hiding source code modifications, and as a result can suffer from poor accuracy. Finally, to address the poor robustness and accuracy of existing tools identified in the empirical evaluation, this work presents the design and evaluation of a novel source code plagiarism detection tool. The presented tool identifies indications of plagiarism by analysing the runtime behaviour of assignment submissions. This approach is demonstrated to be both more robust and more accurate than currently available source code plagiarism detection tools.
- Subject
- source code plagiarism detection; source code similarity; behavioural similarity; plagiarism
- Identifier
- http://hdl.handle.net/1959.13/1430168
- Identifier
- uon:38806
- Rights
- Copyright 2021 Hayden John Cheers
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 2 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 162 KB | Adobe Acrobat PDF